
    A Data-Oriented Model of Literary Language

    We consider the task of predicting how literary a text is, with a gold standard derived from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and it explains 76.0% of the variation in literary ratings.

    Comment: To be published in EACL 2017, 11 pages.
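    As an illustration of the kind of pipeline this abstract describes, the sketch below joins a bigram baseline with counts of pre-mined syntactic tree fragments in one regression model. This is not the authors' implementation: scikit-learn is an assumption, and `fragment_counts` is a hypothetical stand-in for the fragment-mining step, which the abstract only names.

```python
# Illustrative sketch only: a scikit-learn pipeline combining a bigram
# baseline with syntactic tree-fragment features. `fragment_counts` is a
# hypothetical placeholder for fragments mined from parsed training data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVR

def fragment_counts(texts):
    """Hypothetical: frequency of each pre-mined tree fragment per text."""
    return [{"(NP (DT the) (JJ *) (NN *))": 0.0} for _ in texts]  # placeholder

features = make_union(
    TfidfVectorizer(ngram_range=(2, 2)),  # word-bigram baseline
    make_pipeline(FunctionTransformer(fragment_counts, validate=False),
                  DictVectorizer()),
)
model = make_pipeline(features, LinearSVR())
# model.fit(train_texts, train_ratings)
# model.score(test_texts, test_ratings)  # R^2, i.e. variation explained
```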

    Annotation and Prediction of Movie Sentiment Arcs

    Some narratologists have argued that all stories derive from a limited set of archetypes. Specifically, Vonnegut (2005) claims in his Shapes of Stories lecture that if we graph the emotions in a story over time, the shape will be an instance of one of six basic story shapes. The work of Jockers (2015) and Reagan et al. (2016) purports to confirm this hypothesis empirically using automatic sentiment analysis (rather than manual annotations of story arcs) and algorithms to cluster story arcs into fundamental shapes. Later work has applied similar techniques to movies (Del Vecchio et al., 2019). This line of work has attracted criticism. Swafford (2015) argues that sentiment analysis needs to be validated on and adapted to narrative text. Enderle (2016) argues that the various methods for reducing story shapes to the putative six fundamental types actually produce algorithmic artifacts, and that random sentiment arcs can also be clustered into six "fundamental" shapes.

    In this paper I will not attempt to find fundamental (or even universal) story shapes; instead, I take the observed story shape of each narrative as is, without trying to cluster the shapes into archetypes. My aim is to empirically validate how well basic sentiment analysis tools can reproduce a sentiment arc obtained through manual annotation based on the narrative text. Rather than considering novels as narratives, I consider movies, since the annotation of movies, when done in real time, is less time consuming. In a previous abstract, I considered the task of predicting the annotated sentiment of individual sentences from movie scripts (van Cranenburgh, 2020), and concluded that sentiment analysis tools achieve comparable performance on narrative text as on reviews and social media text (pace Swafford 2015). In this abstract I consider the task of predicting the overall sentiment as annotated while watching the movie. This task is more challenging, since the connection between the annotated sentiment and the narrative text is potentially more distant.
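    To make the validation task concrete, the sketch below derives a sentiment arc from per-sentence scores and measures its agreement with a manually annotated arc. VADER (via NLTK) stands in for the unnamed "basic sentiment analysis tools", and the smoothing window and correlation measure are arbitrary illustrative choices, not the paper's protocol.

```python
# Sketch of comparing an automatic sentiment arc to a manual annotation.
# VADER is an assumption; the abstract does not name a specific tool.
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

def sentiment_arc(sentences, window=25):
    """Per-sentence compound sentiment, smoothed with a moving average."""
    sia = SentimentIntensityAnalyzer()
    scores = np.array([sia.polarity_scores(s)["compound"] for s in sentences])
    return np.convolve(scores, np.ones(window) / window, mode="valid")

def arc_agreement(auto_arc, manual_arc):
    """Pearson correlation after resampling both arcs to a common length."""
    grid = np.linspace(0.0, 1.0, max(len(auto_arc), len(manual_arc)))
    a = np.interp(grid, np.linspace(0.0, 1.0, len(auto_arc)), auto_arc)
    m = np.interp(grid, np.linspace(0.0, 1.0, len(manual_arc)), manual_arc)
    return np.corrcoef(a, m)[0, 1]
```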

    Machine Learning Literature using Textual Features

    Literature is hard to define. The value-judgment definition holds that literature is a highly valued kind of writing [2, p. 9], but how arbitrary or predictable are such judgments? Moreover, some believe that critics and publishers wield more influence than the text itself [1]. We investigate these questions with a computational model of literature trained on texts. As part of The Riddle of Literary Quality (http://literaryquality.huygens.knaw.nl), an online survey (14k respondents) was conducted among the general public to collect judgments on 401 recent, bestselling Dutch novels. Given a list of author-title pairs, respondents rated the novels they had read on a 7-point scale from definitely not to highly literary. We consider the regression task of predicting the mean rating of each novel using features extracted from its text.

    We train a linear support vector regression model on frequencies of bigrams and on syntactic features. The syntactic features consist of tree fragments mined from the trees obtained by automatically parsing the novels. Our predictive model explains 57.5% of the variance in literary ratings, with a root mean squared error of 0.65 on a scale of 0–7 (evaluation based on 5-fold cross-validation with the 401 novels). This is in line with pilot experiments on a subset of the novels with only bigrams [3]. Although the bigrams form a simple, strong baseline, the syntactic features are more interpretable. We conclude that perceptions of literariness can be explained to a large extent from the text itself: there is an intrinsic literariness to literary texts.
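    The reported numbers suggest an evaluation along the following lines: cross-validated predictions from a linear SVR on bigram frequencies, scored with R² (variance explained) and RMSE. The sketch below assumes scikit-learn, simplifies the feature set (no tree fragments), and uses placeholder hyperparameters; it shows the protocol, not the authors' exact setup.

```python
# Sketch of the evaluation protocol: linear SVR on bigram frequencies,
# scored with 5-fold cross-validation. Feature set simplified; the full
# model also uses mined tree fragments.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

def evaluate(texts, mean_ratings):
    """Cross-validated R^2 and RMSE for predicting mean literary ratings."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(2, 2)),  # word-bigram frequencies
        LinearSVR(C=1.0, max_iter=10_000),
    )
    preds = cross_val_predict(model, texts, mean_ratings, cv=5)
    return r2_score(mean_ratings, preds), np.sqrt(mean_squared_error(mean_ratings, preds))

# r2, rmse = evaluate(novel_texts, ratings)  # abstract reports 0.575 and 0.65
```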